Energy flow polynomials: A complete linear basis for jet substructure
We introduce the energy flow polynomials: a complete set of jet substructure
observables which form a discrete linear basis for all infrared- and
collinear-safe observables. Energy flow polynomials are multiparticle energy
correlators with specific angular structures that are a direct consequence of
infrared and collinear safety. We establish a powerful graph-theoretic
representation of the energy flow polynomials which allows us to design
efficient algorithms for their computation. Many common jet observables are
exact linear combinations of energy flow polynomials, and we demonstrate the
linear spanning nature of the energy flow basis by performing regression for
several common jet observables. Using linear classification with energy flow
polynomials, we achieve excellent performance on three representative jet
tagging problems: quark/gluon discrimination, boosted W tagging, and boosted
top tagging. The energy flow basis provides a systematic framework for complete
investigations of jet substructure using linear methods.
Comment: 41+15 pages, 13 figures, 5 tables; v2: updated to match JHEP version
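The correlator structure described above can be made concrete: for a multigraph G with N vertices and edges (k, l), the corresponding energy flow polynomial is a sum over all N-fold particle assignments, with one energy fraction z per vertex and one pairwise angular distance theta per edge. A brute-force sketch (the `efp` helper and its input format are hypothetical; the paper's graph-based algorithms are far more efficient):

```python
from itertools import product

def efp(graph, zs, thetas):
    """Brute-force energy flow polynomial for a multigraph (toy).

    graph: list of edges (k, l) on vertices 0..N-1
    zs: energy fractions z_i of the M particles
    thetas: M x M matrix of pairwise angular distances
    """
    n_vertices = 1 + max(max(e) for e in graph) if graph else 1
    total = 0.0
    # sum over all ways to assign particles to graph vertices
    for assignment in product(range(len(zs)), repeat=n_vertices):
        term = 1.0
        for v in assignment:          # one energy fraction per vertex
            term *= zs[v]
        for (k, l) in graph:          # one angular weight per edge
            term *= thetas[assignment[k]][assignment[l]]
        total += term
    return total
```

For the single-edge graph this reproduces the familiar two-point correlator, a double sum of z_i z_j theta_ij over particle pairs.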
An operational definition of quark and gluon jets
While "quark" and "gluon" jets are often treated as separate, well-defined
objects in both theoretical and experimental contexts, no precise, practical,
and hadron-level definition of jet flavor presently exists. To remedy this
issue, we develop and advocate for a data-driven, operational definition of
quark and gluon jets that is readily applicable at colliders. Rather than
specifying a per-jet flavor label, we aggregately define quark and gluon jets
at the distribution level in terms of measured hadronic cross sections.
Intuitively, quark and gluon jets emerge as the two maximally separable
categories within two jet samples in data. Benefiting from recent work on
data-driven classifiers and topic modeling for jets, we show that the practical
tools needed to implement our definition already exist for experimental
applications. As an informative example, we demonstrate the power of our
operational definition using Z+jet and dijet samples, illustrating that pure
quark and gluon distributions and fractions can be successfully extracted in a
fully well-defined manner.
Comment: 38 pages, 10 figures, 1 table; v2: updated to match JHEP version
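The distribution-level extraction can be sketched with histograms: the reducibility factor of one mixture with respect to another is the largest subtractable fraction, and the two maximally separable categories follow by subtraction. A minimal sketch, assuming strictly positive bins and mutual irreducibility of the underlying categories (toy inputs, not the full procedure):

```python
import numpy as np

def jet_topics(p_a, p_b):
    """Extract two 'topic' distributions from two mixed histograms (toy).

    p_a, p_b: normalized histograms of the same observable in two samples.
    """
    p_a, p_b = np.asarray(p_a, float), np.asarray(p_b, float)
    # reducibility factors: largest amount of one mixture inside the other
    kappa_ab = np.min(p_a / p_b)  # assumes strictly positive bins
    kappa_ba = np.min(p_b / p_a)
    # maximally separable categories via subtraction
    topic1 = (p_a - kappa_ab * p_b) / (1.0 - kappa_ab)
    topic2 = (p_b - kappa_ba * p_a) / (1.0 - kappa_ba)
    return topic1, topic2
```

Each extracted topic is itself a normalized distribution, and the two input mixtures are recovered as convex combinations of the topics.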
Disentangling Quarks and Gluons with CMS Open Data
We study quark and gluon jets separately using public collider data from the
CMS experiment. Our analysis is based on 2.3/fb of proton-proton collisions at
7 TeV, collected at the Large Hadron Collider in 2011. We define two
non-overlapping samples via a pseudorapidity cut -- central jets with |eta| <
0.65 and forward jets with |eta| > 0.65 -- and employ jet topic modeling to
extract individual distributions for the maximally separable categories. Under
certain assumptions, such as sample independence and mutual irreducibility,
these categories correspond to "quark" and "gluon" jets, as given by a recently
proposed operational definition. We consider a number of different methods for
extracting reducibility factors from the central and forward datasets, from
which the fractions of quark jets in each sample can be determined. The
greatest stability and robustness to statistical uncertainties is achieved by a
novel method based on parametrizing the endpoints of a receiver operating
characteristic (ROC) curve. To mitigate detector effects, which would otherwise
induce unphysical differences between central and forward jets, we use the
OmniFold method to perform central value unfolding. As a demonstration of the
power of this method, we extract the intrinsic dimensionality of the quark and
gluon jet samples, which exhibit Casimir scaling, as expected from the
strongly-ordered limit. To our knowledge, this work is the first application of
full phase space unfolding to real collider data, and one of the first
applications of topic modeling to extract separate quark and gluon
distributions at the LHC.
Comment: 31 pages, 24 figures, 1 table, 1 koala
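Once the two reducibility factors are in hand, the quark fractions of the two samples follow in closed form from the topic-decomposition relations. A minimal sketch, with hypothetical central/forward naming and assuming mutual irreducibility:

```python
def topic_fractions(kappa_cf, kappa_fc):
    """Topic fractions of two samples from their reducibility factors (toy).

    kappa_cf = min_x p_central(x) / p_forward(x), and vice versa.
    Returns the fraction of the first topic in each sample.
    """
    denom = 1.0 - kappa_cf * kappa_fc
    f_central = (1.0 - kappa_cf) / denom
    f_forward = kappa_fc * (1.0 - kappa_cf) / denom
    return f_central, f_forward
```

These expressions follow from inverting the linear relations that define the topics as subtracted mixtures.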
Pileup Mitigation with Machine Learning (PUMML)
Pileup involves the contamination of the energy distribution arising from the
primary collision of interest (leading vertex) by radiation from soft
collisions (pileup). We develop a new technique for removing this contamination
using machine learning and convolutional neural networks. The network takes as
input the energy distribution of charged leading vertex particles, charged
pileup particles, and all neutral particles and outputs the energy distribution
of particles coming from the leading vertex alone. The PUMML algorithm performs
remarkably well at eliminating pileup distortion on a wide range of simple and
complex jet observables. We test the robustness of the algorithm in a number of
ways and discuss how the network can be trained directly on data.
Comment: 20 pages, 8 figures, 2 tables. Updated to JHEP version
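The input/output structure can be illustrated with a simple per-pixel baseline: scale the neutral channel by the local charged leading-vertex fraction. This is a hypothetical stand-in for the learned convolutional mapping, not the PUMML network itself:

```python
import numpy as np

def charged_fraction_estimate(charged_lv, charged_pu, neutral, eps=1e-9):
    """Per-pixel baseline for the neutral leading-vertex energy map (toy).

    charged_lv, charged_pu, neutral: 2D pixel arrays of transverse energy.
    Scales the neutral channel by the local charged leading-vertex fraction.
    """
    frac = charged_lv / (charged_lv + charged_pu + eps)
    return neutral * frac
```

A learned model takes the same three channels as input but can exploit nonlocal correlations that this per-pixel rescaling cannot.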
Learning to Classify from Impure Samples with High-Dimensional Data
A persistent challenge in practical classification tasks is that labeled
training sets are not always available. In particle physics, this challenge is
surmounted by the use of simulations. These simulations accurately reproduce
most features of data, but cannot be trusted to capture all of the complex
correlations exploitable by modern machine learning methods. Recent work in
weakly supervised learning has shown that simple, low-dimensional classifiers
can be trained using only the impure mixtures present in data. Here, we
demonstrate that complex, high-dimensional classifiers can also be trained on
impure mixtures using weak supervision techniques, with performance comparable
to what could be achieved with pure samples. Using weak supervision will
therefore allow us to avoid relying exclusively on simulations for
high-dimensional classification. This work opens the door to a new regime
whereby complex models are trained directly on data, providing direct access to
probe the underlying physics.
Comment: 6 pages, 2 tables, 2 figures. v2: updated to match PRD version
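The key fact behind this style of weak supervision is that a classifier trained to separate two mixed samples is also optimal for separating the pure categories, because the mixture likelihood ratio is a monotonic function of the pure one. A binned toy illustration with hypothetical distributions:

```python
import numpy as np

# toy binned "signal" and "background" distributions (hypothetical)
p_s = np.array([0.5, 0.3, 0.2])
p_b = np.array([0.1, 0.3, 0.6])

# impure training mixtures with different (unknown) signal fractions
m1 = 0.8 * p_s + 0.2 * p_b
m2 = 0.3 * p_s + 0.7 * p_b

# the mixture likelihood ratio m1/m2 orders the bins exactly as the
# pure ratio p_s/p_b does, so a classifier trained on impure samples
# induces the same decision ordering as one trained on pure samples
order_mix = np.argsort(m1 / m2)
order_pure = np.argsort(p_s / p_b)
assert (order_mix == order_pure).all()
```

This ordering argument is what lets high-dimensional classifiers be trained directly on the mixtures present in data.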
OmniFold: A Method to Simultaneously Unfold All Observables
Collider data must be corrected for detector effects ("unfolded") to be
compared with many theoretical calculations and measurements from other
experiments. Unfolding is traditionally done for individual, binned observables
without including all information relevant for characterizing the detector
response. We introduce OmniFold, an unfolding method that iteratively reweights
a simulated dataset, using machine learning to capitalize on all available
information. Our approach is unbinned, works for arbitrarily high-dimensional
data, and naturally incorporates information from the full phase space. We
illustrate this technique on a realistic jet substructure example from the
Large Hadron Collider and compare it to standard binned unfolding methods. This
new paradigm enables the simultaneous measurement of all observables, including
those not yet invented at the time of the analysis.
Comment: 8 pages, 3 figures, 1 table, 1 poem; v2: updated to approximate PRL
version
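The iterative reweighting can be sketched in one dimension, with a histogram ratio standing in for the classifier that OmniFold would actually train (the real method is unbinned, high-dimensional, and uses the full phase space; this is a toy):

```python
import numpy as np

def reweight_iteration(sim_det, data_det, weights, bins):
    """One reweighting step in one dimension (toy OmniFold-style update).

    Reweights simulated detector-level events toward the data histogram.
    """
    h_data, _ = np.histogram(data_det, bins=bins, density=True)
    h_sim, _ = np.histogram(sim_det, bins=bins, weights=weights, density=True)
    # look up each simulated event's bin and take the data/sim ratio
    idx = np.clip(np.digitize(sim_det, bins) - 1, 0, len(h_data) - 1)
    ratio = np.where(h_sim[idx] > 0,
                     h_data[idx] / np.maximum(h_sim[idx], 1e-12), 1.0)
    return weights * ratio
```

In the full method these weights are pulled back to generator level and the procedure is iterated, converging to an unfolded event sample.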
The Hidden Geometry of Particle Collisions
We establish that many fundamental concepts and techniques in quantum field
theory and collider physics can be naturally understood and unified through a
simple new geometric language. The idea is to equip the space of collider
events with a metric, from which other geometric objects can be rigorously
defined. Our analysis is based on the energy mover's distance, which quantifies
the "work" required to rearrange one event into another. This metric, which
operates purely at the level of observable energy flow information, allows for
a clarified definition of infrared and collinear safety and related concepts. A
number of well-known collider observables can be exactly cast as the minimum
distance between an event and various manifolds in this space. Jet definitions,
such as exclusive cone and sequential recombination algorithms, can be directly
derived by finding the closest few-particle approximation to the event. Several
area- and constituent-based pileup mitigation strategies are naturally
expressed in this formalism as well. Finally, we lift our reasoning to develop
a precise distance between theories, which are treated as collections of events
weighted by cross sections. In all of these various cases, a better
understanding of existing methods in our geometric language suggests
interesting new ideas and generalizations.
Comment: 56 pages, 11 figures, 5 tables; v2: minor changes and updated
references; v3: updated to match JHEP version
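The metric idea can be illustrated in one dimension, where optimal transport between two events of equal multiplicity and unit-energy particles reduces to pairing particles in sorted order. A toy sketch (the real energy mover's distance handles unequal energies and pairwise angular costs):

```python
def emd_1d(event1, event2):
    """Optimal-transport distance between two 1D events (toy).

    Assumes unit-energy particles and equal multiplicity, where the
    minimum-work rearrangement pairs particles in sorted order.
    """
    assert len(event1) == len(event2)
    return sum(abs(a - b) for a, b in zip(sorted(event1), sorted(event2)))
```

Even in this toy setting, the distance quantifies the "work" needed to rearrange one event into another, the notion the abstract builds on.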
Deep learning in color: towards automated quark/gluon jet discrimination
Artificial intelligence offers the potential to automate challenging data-processing tasks in collider physics. To establish its prospects, we explore to what extent deep learning with convolutional neural networks can discriminate quark and gluon jets better than observables designed by physicists. Our approach builds upon the paradigm that a jet can be treated as an image, with intensity given by the local calorimeter deposits. We supplement this construction by adding color to the images, with red, green and blue intensities given by the transverse momentum in charged particles, transverse momentum in neutral particles, and pixel-level charged particle counts. Overall, the deep networks match or outperform traditional jet variables. We also find that, while various simulations produce different quark and gluon jets, the neural networks are surprisingly insensitive to these differences, similar to traditional observables. This suggests that the networks can extract robust physical information from imperfect simulations.
Massachusetts Institute of Technology. Department of Physics
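The image construction described above can be sketched directly: bin each particle into a pixel grid and fill the three color channels. A minimal sketch with a hypothetical particle format `(eta, phi, pt, is_charged)` and toy pixelization parameters:

```python
import numpy as np

def jet_image(particles, npix=33, extent=0.8):
    """Build a 3-channel 'color' jet image from a particle list (toy).

    particles: iterable of (eta, phi, pt, is_charged), with coordinates
    already centered on the jet axis.
    Channels: charged pT (red), neutral pT (green), charged count (blue).
    """
    img = np.zeros((3, npix, npix))
    for eta, phi, pt, is_charged in particles:
        i = int((eta + extent) / (2 * extent) * npix)
        j = int((phi + extent) / (2 * extent) * npix)
        if 0 <= i < npix and 0 <= j < npix:
            if is_charged:
                img[0, i, j] += pt   # red: charged transverse momentum
                img[2, i, j] += 1.0  # blue: charged-particle count
            else:
                img[1, i, j] += pt   # green: neutral transverse momentum
    return img
```

Such images are then fed to a convolutional network in the same way as ordinary RGB pictures.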
Pileup and Infrared Radiation Annihilation (PIRANHA): A Paradigm for Continuous Jet Grooming
Jet grooming is an important strategy for analyzing relativistic particle
collisions in the presence of contaminating radiation. Most jet grooming
techniques introduce hard cutoffs to remove soft radiation, leading to
discontinuous behavior and associated experimental and theoretical challenges.
In this paper, we introduce Pileup and Infrared Radiation Annihilation
(PIRANHA), a paradigm for continuous jet grooming that overcomes the
discontinuity and infrared sensitivity of hard-cutoff grooming procedures. We
motivate PIRANHA from the perspective of optimal transport and the Energy
Mover's Distance and review Apollonius Subtraction and Iterated Voronoi
Subtraction as examples of PIRANHA-style grooming. We then introduce a new
tree-based implementation of PIRANHA, Recursive Subtraction, with reduced
computational costs. Finally, we demonstrate the performance of Recursive
Subtraction in mitigating sensitivity to soft distortions from hadronization
and detector effects, and additive contamination from pileup and the underlying
event.
Comment: 38+35 pages, 20 figures. PIRANHA algorithm code available at
http://github.com/pkomiske/Piranh
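The contrast with hard cutoffs can be illustrated with a one-line toy: subtracting a fixed amount from every particle's pT, flooring at zero, deforms the groomed event smoothly as the inputs vary, whereas a hard cutoff drops a particle discontinuously the moment its pT crosses the threshold. This is only a schematic stand-in, not the Apollonius, Iterated Voronoi, or Recursive Subtraction algorithms themselves:

```python
def continuous_groom(pts, rho):
    """Continuously groom a list of particle pTs (toy illustration).

    Subtracts rho from every particle and floors at zero, so the output
    is a continuous function of the input pTs; a hard-cutoff groomer
    instead removes a particle entirely once its pT falls below a cut.
    """
    return [max(pt - rho, 0.0) for pt in pts]
```

Continuity in the inputs is the property that tames the infrared sensitivity the abstract highlights.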